Abstract. Traditional spatial queries return, for a given query object $q$, all database
objects that satisfy a given predicate, such as $\varepsilon$-range and $k$-nearest neighbors.
This paper defines and studies inverse spatial queries, which, given a subset of database
objects $Q$ and a query predicate, return all objects which, if used as query objects with
the predicate, contain $Q$ in their result. We first show a straightforward solution for
answering inverse spatial queries for any query predicate. Then, we propose a
filter-and-refinement framework that can be used to improve efficiency. We show how to apply
this framework to a variety of inverse queries, using appropriate space pruning strategies.
In particular, we propose solutions for inverse $\varepsilon$-range queries, inverse
$k$-nearest neighbor queries, and inverse skyline queries. Our experiments show that our
framework is significantly more efficient than naive approaches.



1 Introduction 

Recently, reverse queries have attracted considerable interest. They take as input an object
$o$ and find the queries which have $o$ in their result set. A characteristic example is the
reverse $k$NN query [5,9], whose objective is to find the query objects (from a given
dataset) that have a given input object in their $k$NN set. In such an operation the roles
of the query and data objects are reversed; while the $k$NN query finds the data objects
which are the nearest neighbors of a given query object, the reverse query finds the objects
which, if used as queries, return a given data object in their result. Besides $k$NN
search, reverse queries have also been studied for other spatial and multidimensional
search problems, such as top-$k$ search [10] and dynamic skyline [6]. Reverse queries
mainly find application in data analysis tasks; e.g., given a product, find the customer
searches that have this product in their result. A wide range of such applications is
outlined in [5], including business impact analysis, referral and recommendation systems,
and maintenance of document repositories.

In this paper, we generalize the concept of reverse queries. We note that the current
definitions take as input a single object. However, similarity queries such as $k$NN
queries and $\varepsilon$-range queries may in general return more than one result. Data
analysts are often interested in the queries that include two or more given objects in their
result. Such information can be meaningful in applications where only the result of a query
can be (partially) observed, but the actual query object is not known. For example, consider
an online shop selling a variety of different products stored in a database $\mathcal{D}$.
The online shop may be interested in offering a package of products $Q \subseteq \mathcal{D}$
for a special price. The problem at hand is to identify customers who are interested in all
items of the package, in order to direct an advertisement to them. We assume that the
preferences of registered customers are known. First, we need to define a predicate
indicating whether a user is interested in a product. A customer may be interested in a
product if

- the distance between the product's features and the customer's preference is less
than a threshold $\varepsilon$.

- the product is contained in the set of his k favorite items, i.e., the k-set of product 
features closest to the user's preferences. 

- the product is contained in the customer's dynamic skyline, i.e., there is no other 
product that better fits the customer's preferences in every possible way. 

Therefore, we want to identify customers $r$ such that the query on $\mathcal{D}$ with query
object $r$, using one of the query predicates above, contains $Q$ in the result set. More
specifically, consider a set $\mathcal{D} \subseteq \mathbb{R}^d$ as a database of $n$
objects and let $d(\cdot)$ denote the Euclidean distance in $\mathbb{R}^d$. Let
$\mathcal{P}(q)$ be a query on $\mathcal{D}$ with predicate $\mathcal{P}$ and query object $q$.

Definition 1. An inverse $\mathcal{P}$ query (I$\mathcal{P}$Q) computes for a given set of
query objects $Q \subseteq \mathcal{D}$ the set of points $r \in \mathbb{R}^d$ for which $Q$
is in the $\mathcal{P}$ query result; formally:

$$\mathrm{I}\mathcal{P}\mathrm{Q} = \{ r \in \mathbb{R}^d : Q \subseteq \mathcal{P}(r) \}$$

Simply speaking, the result of the general inverse query is the subset of the space defined
by all objects $r$ for which all $Q$-objects are in $\mathcal{P}(r)$. Special cases of the
query are

- The mono-chromatic inverse $\mathcal{P}$ query, for which the result set is a subset of
$\mathcal{D}$.

- The bi-chromatic inverse $\mathcal{P}$ query, for which the result set is a subset of a
given database $\mathcal{D}' \subseteq \mathbb{R}^d$.
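Definition 1 can be read directly as a brute-force procedure for the mono-chromatic case. The following Python sketch is only an illustration of the definition, not the method proposed in this paper: the function names `eps_range` and `inverse_query` are hypothetical, objects are assumed to be coordinate tuples, and the predicate is evaluated once per database object by a linear scan.

```python
import math

def eps_range(r, data, eps):
    """Example predicate P(r): all objects of `data` within distance eps of r."""
    return {p for p in data if math.dist(r, p) <= eps}

def inverse_query(data, queries, predicate):
    """Mono-chromatic inverse query by definition: every r in data whose
    predicate result P(r) contains the whole query set Q."""
    q = set(queries)
    return {r for r in data if q <= predicate(r, data)}

# Usage: which objects have both (0, 0) and (2, 0) within distance 1.5?
data = [(0, 0), (1, 0), (2, 0), (10, 0)]
result = inverse_query(data, [(0, 0), (2, 0)], lambda r, d: eps_range(r, d, 1.5))
```

Any other predicate with the same `(r, data)` signature could be plugged in; the cost is one full predicate evaluation per database object.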

In this paper, we study the inverse versions of three common query types in spatial 
and multimedia databases as follows. 

Inverse $\varepsilon$-Range Query (I$\varepsilon$-RQ). The inverse $\varepsilon$-range query
returns all objects which have a sufficiently low distance to all query objects. For a
bi-chromatic sample application of this type of query, consider a movie database containing
a large number of movie records. Each movie record contains features such as humor,
suspense, romance, etc. Users of the database are represented by the same attributes,
describing their preferences. We want to create a recommendation system that recommends to
users movies that are sufficiently similar to their preferences (i.e., distance less than
$\varepsilon$). Now, assume that a group of users, such as a family, want to watch a movie
together; a bi-chromatic I$\varepsilon$-RQ query will recommend movies which are similar to
all members of the family. For a mono-chromatic example, consider the set
$Q = \{q_1, q_2\}$ of query objects in Figure 1(a) and the set of database points
$\mathcal{D} = \{p_1, p_2, \ldots, p_6\}$. If the range $\varepsilon$ is as illustrated in
the figure, the result of I$\varepsilon$-RQ($Q$) is $\{p_2, p_4, p_5\}$ (e.g., $p_1$ is
dropped because $d(p_1, q_2) > \varepsilon$).

Fig. 1. Examples of inverse queries: (a) I$\varepsilon$-RQ; (b) I$k$-NNQ, $k = 3$.

Inverse $k$-NN Query (I$k$-NNQ). The inverse $k$NN query returns the objects which
have all query points in their $k$NN set. For example, mono-chromatic inverse $k$NN
queries can be used to aid crime detection. Assume that a set of households have been
robbed in short succession and the robber must be found. Assume that the robber will
only rob houses which are in his close vicinity, e.g. within the closest hundred
households. Under this assumption, performing an inverse 100NN query, using the set of
robbed households as $Q$, returns the set of possible suspects. A mono-chromatic inverse
3NN query for $Q = \{q_1, q_2\}$ in Figure 1(b) returns $\{p_4\}$. $p_6$, for example, is
dropped, as $q_2$ is not contained in the list of its 3 nearest neighbors.

Inverse Dynamic Skyline Query (I-DSQ). An inverse dynamic skyline query returns the
objects which have all query objects in their dynamic skyline. A sample application for the
general inverse dynamic skyline query is a product recommendation problem: Assume there is
a company, e.g. a photo camera company, that provides its products via an internet portal.
The company wants to recommend products to its customers by analyzing the web pages they
visit. The score function used by the customer to rate the attributes of products is
unknown. However, the set of products that the customer has clicked on can be seen as
samples of products that he or she is interested in, and thus must be in the customer's
dynamic skyline. The inverse dynamic skyline query can be used to narrow the space in which
the customer's preferences are located. Objects which have all clicked products in their
dynamic skyline are likely to be interesting to the customer. In Figure 1, assuming that
$Q = \{q_1, q_2\}$ are clicked products, I-DSQ($Q$) includes $p_6$, since both $q_1$ and
$q_2$ are included in the dynamic skyline of $p_6$.

For simplicity, we focus on the mono-chromatic cases of the respective query types 
(i.e., query points and objects are taken from the same data set); however, the proposed 
techniques can also be applied for the bi-chromatic (cf. Appendix D) and the general 
case. 

Motivation. A naive way to process any inverse spatial query is to compute the
corresponding reverse query for each $q_i \in Q$ and then intersect these results. The
problem with this method is that running a reverse query for each $q_i$ multiplies the
complexity of the reverse query by $|Q|$, both in terms of computational and I/O cost.
Objects that are not shared by two or more reverse queries in $Q$ are unnecessarily
retrieved, while objects that are shared by two or more queries are redundantly accessed
multiple times. We propose a filter-refinement framework for inverse queries, which first
applies a number of filters using the set of query objects $Q$ to effectively prune objects
which cannot participate in the result. Afterwards, candidates are pruned by considering
other database objects. Finally, during a refinement step, the remaining candidates are
verified against the inverse query and the results are output. Details of our framework are
shown in Section 3. When applying our framework to the three inverse queries under study,
filtering and refinement are sometimes integrated in the same algorithm, which performs
these steps in an iterative manner. Although for I$\varepsilon$-RQ queries the application
of our framework is straightforward, for I$k$-NNQ and I-DSQ we define and exploit special
pruning techniques that are novel compared to the approaches used for solving the
corresponding reverse queries.
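The naive baseline described in the Motivation paragraph can be sketched as follows. This is a linear-scan illustration with hypothetical names (`knn`, `reverse_query`, `naive_inverse_query`); it assumes coordinate tuples, that an object is not counted among its own neighbors, and that ties are broken by input order. A real baseline would run an index-based reverse query per query object instead.

```python
import math

def knn(r, data, k):
    """The k objects of `data` nearest to r (assumption: r itself is excluded)."""
    others = sorted((p for p in data if p != r), key=lambda p: math.dist(r, p))
    return set(others[:k])

def reverse_query(data, q, k):
    """Reverse kNN of a single object q: all r that have q in their kNN set."""
    return {r for r in data if q in knn(r, data, k)}

def naive_inverse_query(data, queries, k):
    """Run one reverse query per query object, then intersect the results.
    The work grows with |Q|, which is the cost the framework tries to avoid."""
    partial = [reverse_query(data, q, k) for q in queries]
    return set.intersection(*partial) if partial else set(data)

data = [(0, 0), (1, 0), (2, 0), (10, 0)]
result = naive_inverse_query(data, [(0, 0), (2, 0)], k=2)
```

In this toy data set the two reverse queries return overlapping but different sets, and only their intersection survives, which illustrates why objects are retrieved unnecessarily or repeatedly.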

Outline. The rest of the paper is organized as follows. In the next section we give 
an overview of the previous work which is related to inverse query processing. Section 
3 describes our framework. In Sections 4-6 we implement it on the three inverse spatial 
query types; we first briefly introduce the pruning strategies for the single-query-object 
case and then show how to apply the framework in order to handle the multi-query- 
object case in an efficient way. Section 7 is an experimental evaluation and Section 8 
concludes the paper. 

2 Related Work 

The problem of supporting reverse queries efficiently, i.e. the case where $Q$ only
contains a single database object, has been studied extensively. However, none of the
proposed approaches is directly extendable for the efficient support of inverse queries when
$|Q| > 1$. First, there exists no related work on reverse queries for the
$\varepsilon$-range query predicate. This is not surprising since the reverse
$\varepsilon$-range query is equal to a (normal) $\varepsilon$-range query. However, there
exists a large body of work for reverse $k$-nearest neighbor (R$k$NN) queries. Self-pruning
approaches like the RNN-tree [5] and the RdNN-tree [11] operate on top of a spatial index,
like the R-tree. Their objective is to estimate the $k$NN distance of each index entry $e$.
If the $k$NN distance of $e$ is smaller than the distance of $e$ to the query $q$, then $e$
can be pruned. These methods suffer from the high materialization and maintenance cost of
the $k$NN distances.

Mutual-pruning approaches such as [8, 7, 9] use other points to prune a given index
entry $e$. TPL [9] is the most general and efficient approach. It uses an R-tree to compute
a nearest neighbor ranking of the query point $q$. The key idea is to iteratively construct
Voronoi hyper-planes around $q$ using the retrieved neighbors. TPL can be used for inverse
$k$NN queries where $|Q| > 1$ by simply performing a reverse $k$NN query for each
query point and then intersecting the results (i.e., the brute-force approach).

For reverse dynamic skyline queries, [2] proposed an efficient solution, which first 
performs a filter-step, pruning database objects that are globally dominated by some 
point in the database. For the remaining points, a window query is performed in a re- 
finement step. In addition, [6] gave a solution for reverse dynamic skyline computation 
on uncertain data. None of these methods considers the case of $|Q| > 1$, which is the
focus of our work.



In [10] the problem of reverse top-$k$ queries is studied. A reverse top-$k$ query returns,
for a point $q$ and a positive integer $k$, the set of linear preference functions for which
$q$ is contained in their top-$k$ result. The authors provide an efficient solution for the
2D case and discuss its generalization to the multidimensional case, but do not consider the
case where $|Q| > 1$. Although we do not study inverse top-$k$ queries in this paper, we
note that it is an interesting subject for future work.

3 Inverse Query (IQ) Framework 

Our solutions for the three inverse queries under study are based on a common frame- 
work consisting of the following filter-refinement pipeline: 

Filter 1: Fast Query Based Validation: The first component of the framework, called
fast query based validation, uses the set of query objects $Q$ only to perform a quick
check on whether it is possible to have any result at all. In particular, this filter verifies
simple constraints that are necessary conditions for a non-empty result. For example,
for the I$k$NN case, the result is empty if $|Q| > k$.

Filter 2: Query Based Pruning: Query based pruning again uses only the query objects
to prune objects in $\mathcal{D}$ which cannot participate in the result. Unlike the simple
first filter, here we employ the topology of the query objects.

Filters 1 and 2 can be performed very fast because they do not involve any database 
object except the query objects. 

Filter 3: Object Based Pruning: This filter, called object based pruning, is more
advanced because it involves database objects in addition to the query objects. The strategy
is to access database objects in ascending order of their maximum distance to any query
point; formally:

$$MaxDist(o, Q) = \max_{q \in Q} d(o, q).$$

The rationale for this access order is that, given any query object q, objects that are close 
to q have more pruning power, i.e., they are more likely to prune other objects w.r.t. q 
than objects that are more distant to q. To maximize the pruning power, we prefer to 
examine objects that are close to all query points first. 
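The access order of Filter 3 can be illustrated with a few lines of Python. The sketch below simply sorts an in-memory list; the names `max_dist` and `access_order` are hypothetical, and an index-based implementation would produce the same order incrementally with a priority queue rather than a full sort.

```python
import math

def max_dist(o, queries):
    """MaxDist(o, Q): distance from o to the query object furthest from it."""
    return max(math.dist(o, q) for q in queries)

def access_order(data, queries):
    """Database objects in ascending order of MaxDist, i.e. objects that are
    close to all query points come first."""
    return sorted(data, key=lambda o: max_dist(o, queries))

queries = [(0, 0), (4, 0)]
order = access_order([(10, 0), (0, 1), (2, 0)], queries)
```

Here `(2, 0)` is accessed first because it is close to both query points, whereas `(0, 1)` is close to only one of them.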

Note that the applicability of the filters depends on the query. Query based pruning
is applicable if the query objects suffice to restrict the search space, which holds for
the inverse $\varepsilon$-range query and the inverse skyline query but not directly for the
inverse $k$NN query. In contrast, the object based pruning filter is applicable for queries
where database objects can be used to prune other objects, which for example holds for the
inverse $k$NN query and the inverse skyline query but not for the inverse
$\varepsilon$-range query.

Refinement: In the final refinement step, the remaining candidates are verified and the 
true hits are reported as results. 



4 Inverse $\varepsilon$-Range Query



We start with the simplest query, the inverse $\varepsilon$-range query. First, consider the
case of a single query object $q$ (i.e., $|Q| = 1$). In this case, the inverse
$\varepsilon$-range query computes all objects that have $q$ within their
$\varepsilon$-range sphere. Due to the symmetry of the $\varepsilon$-range query predicate,
all objects satisfying the inverse $\varepsilon$-range query predicate are within the
$\varepsilon$-range sphere of $q$, as illustrated in Figure 2(a). In the following, we
consider the general case, where $|Q| > 1$, and show how our framework can be applied.

Fig. 2. Pruning space for I$\varepsilon$-RQ.



4.1 Framework Implementation 

Fast Query Based Validation. There is no possible result if there exists a pair $q, q'$ of
queries in $Q$ such that their $\varepsilon$-ranges do not intersect (i.e.,
$d(q, q') > 2 \cdot \varepsilon$). In this case, there can be no object $r$ having both $q$
and $q'$ within its $\varepsilon$-range (a necessary condition for $r$ to be in the result).

Query Based Pruning. Let $S_i^{\varepsilon} \subseteq \mathbb{R}^d$ be the
$\varepsilon$-sphere around query point $q_i$ for all $q_i \in Q$, as depicted in the example
shown in Figure 2(b). Obviously, any point in the intersection region of all spheres, i.e.
$\bigcap_{i=1..m} S_i^{\varepsilon}$, has all query objects $q_i \in Q$ in its
$\varepsilon$-range. Consequently, all objects outside of this region can be pruned. However,
the computation of the search region can become too expensive in an arbitrary
high-dimensional space; thus, we compute the intersection between rectangles that minimally
bound the hyper-spheres and use it as a filter. This can be done quite efficiently even in
high-dimensional spaces; the resulting filter rectangle is used as a window query and all
objects in it are passed to the refinement step as candidates.

Object Based Pruning. As mentioned in Section 3, this filter is not applicable for inverse
$\varepsilon$-range queries, since objects cannot be used to prune other objects.



Refinement. In the refinement step, for all candidates we compute their distances to all
query points $q \in Q$ and report only objects that are within distance $\varepsilon$ from
all query objects.
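The three applicable steps for the inverse $\varepsilon$-range query (validation, rectangle-based pruning, refinement) can be sketched in Python as below. This is a minimal in-memory illustration under stated assumptions: the function name `inverse_eps_range` is hypothetical, `queries` is assumed to be non-empty, a linear scan stands in for the window query on an index, and "within distance $\varepsilon$" is taken as less than or equal.

```python
import math
from itertools import combinations

def inverse_eps_range(data, queries, eps):
    # Filter 1: two query objects further apart than 2*eps have disjoint
    # eps-ranges, so no object can contain both of them.
    if any(math.dist(a, b) > 2 * eps for a, b in combinations(queries, 2)):
        return []
    dim = len(queries[0])
    # Filter 2: intersect the rectangles that minimally bound the eps-spheres.
    lo = [max(q[i] for q in queries) - eps for i in range(dim)]
    hi = [min(q[i] for q in queries) + eps for i in range(dim)]
    candidates = [p for p in data
                  if all(lo[i] <= p[i] <= hi[i] for i in range(dim))]
    # Refinement: exact distance test against every query object.
    return [p for p in candidates
            if all(math.dist(p, q) <= eps for q in queries)]

data = [(1, 0), (1, 1), (1, 1.4), (0, 0), (5, 5)]
result = inverse_eps_range(data, [(0, 0), (2, 0)], 1.5)
```

In this example `(1, 1.4)` passes the rectangle filter but is rejected in refinement, which shows why the rectangle is only a filter and not the exact search region.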

4.2 Algorithm 

The implementation of our framework above can be easily converted to an algorithm,
which, after applying the filter steps, performs a window query to retrieve the candidates,
which are finally verified. Search can be facilitated by an R-tree that indexes
$\mathcal{D}$. Starting from the root, we search the tree using the filter rectangle. To
minimize the I/O cost, for each entry $P$ of the tree that intersects the filter rectangle,
we compute its distance to all points in $Q$ and access the corresponding subtree only if
all these distances are smaller than $\varepsilon$.

5 Inverse $k$-NN Query

For inverse $k$-nearest neighbor queries (I$k$-NNQ), we first consider the case of a single
query object (i.e., $|Q| = 1$). As discussed in Section 2, this case can be processed by the
bi-section-based R$k$-NN approach (TPL) proposed in [9], enhanced by the rectangle-based
pruning criterion proposed in [3]. The core idea of TPL is to use bi-section-hyperplanes
between database objects $o$ and the query object $q$ in order to check which
objects are closer to $o$ than to $q$. Each bi-section-hyperplane divides the object space
into two half-spaces, one containing $q$ and one containing $o$. Any object located in the
half-space containing $o$ is closer to $o$ than to $q$. The objects spanning the hyperplanes
are collected in an iterative way. Each object $o$ is then checked against the resulting
half-spaces that do not contain $q$. As soon as $o$ is inside more than $k$ such half-spaces,
it can be pruned. Next, we consider queries with multiple objects (i.e., $|Q| > 1$) and
discuss how the framework presented in Section 3 is implemented in this case.

5.1 Framework Implementation 

Fast Query Based Validation Recall that this filter uses the set of query objects Q 
only, to perform a quick check on whether the result is empty. Here, we use the obvious 
rule that the result is empty if the number of query objects exceeds the query parameter 
k. 

Query Based Pruning. We can exploit the query objects in order to reduce the I$k$-NN
query to an I$k'$-NN query with $k' < k$. A smaller query parameter $k'$ allows us
to terminate the query process earlier and reduce the search space. We first show how
$k$ can be reduced by means of the query objects only. The proofs for all lemmas are
presented in Appendix A.

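Lemma 1. Let $\mathcal{D} \subseteq \mathbb{R}^d$ be a set of database objects and
$Q \subseteq \mathcal{D}$ be a set of query objects. Let $\mathcal{D}' = \mathcal{D} - Q$.
For each $o \in \mathcal{D}'$, the following statement holds:

$$o \in \text{I}k\text{-NNQ}(Q) \text{ in } \mathcal{D} \;\Rightarrow\; \forall q \in Q : o \in \text{I}k'\text{-NNQ}(\{q\}) \text{ in } \mathcal{D}' \cup \{q\},$$

where $k' = k - |Q| + 1$.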



Simply speaking, if a candidate object $o$ is not in the I$k'$-NNQ($\{q\}$) result of some
$q \in Q$ considering only the points $\mathcal{D}' \cup \{q\}$, then $o$ cannot be in the
I$k$-NNQ($Q$) result considering all points in $\mathcal{D}$, and $o$ can be pruned. As a
consequence, I$k'$-NNQ($\{q\}$) in $\mathcal{D}' \cup \{q\}$ can be used to prune candidates
for any $q \in Q$. The pruning power of I$k'$-NNQ($\{q\}$) depends on how $q \in Q$ is
selected.

From Lemma 1 we can conclude the following: 

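Lemma 2. Let $o \in \mathcal{D} - Q$ be a database object and $q_{ref} \in Q$ be a query
object such that $\forall q \in Q : d(o, q_{ref}) \geq d(o, q)$. Then

$$o \in \text{I}k\text{-NNQ}(Q) \;\Leftrightarrow\; o \in \text{I}k'\text{-NNQ}(\{q_{ref}\}) \text{ in } \mathcal{D}' \cup \{q_{ref}\},$$

where $k' = k - |Q| + 1$.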

Lemma 2 suggests that for any candidate object $o$ in $\mathcal{D}$, we should use the
furthest query point to check whether $o$ can be pruned.
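Lemma 2 can be checked numerically on small data. The sketch below compares a direct evaluation of the inverse $k$NN definition with the reduced test against the furthest query point only. All names are hypothetical, and the code makes explicit assumptions: objects are coordinate tuples, an object is not its own neighbor, membership in a $k$NN set is decided by counting strictly closer objects, and the data contain no distance ties.

```python
import math

def in_knn(o, q, data, k):
    """q is among the k nearest neighbors of o if fewer than k objects of
    `data` (other than o) are strictly closer to o than q."""
    dq = math.dist(o, q)
    return sum(1 for p in data if p != o and math.dist(o, p) < dq) < k

def iknn_by_definition(data, queries, k):
    """Non-query objects that have every q in Q in their kNN set."""
    return {o for o in data if o not in queries
            and all(in_knn(o, q, data, k) for q in queries)}

def iknn_by_lemma2(data, queries, k):
    """Test only the furthest query point q_ref with k' = k - |Q| + 1,
    evaluated on D' united with {q_ref}, where D' = D - Q."""
    k_prime = k - len(queries) + 1
    if k_prime < 1:                      # |Q| > k: the result is empty
        return set()
    rest = [p for p in data if p not in queries]
    result = set()
    for o in rest:
        q_ref = max(queries, key=lambda q: math.dist(o, q))
        if in_knn(o, q_ref, rest + [q_ref], k_prime):
            result.add(o)
    return result

queries = [(0, 0), (3, 0)]
data = queries + [(1, 1), (1, -2), (7, 1), (9, 9)]
by_definition = iknn_by_definition(data, queries, 3)
by_lemma = iknn_by_lemma2(data, queries, 3)
```

Both functions return the same set on this example; the lemma-based version needs one membership test per candidate instead of $|Q|$.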

Object Based Pruning. Up to now, we only used the query points in order to reduce
$k$ in the inverse $k$-NN query. Now, we will show how to consider database objects in
order to further decrease $k$.

Lemma 3. Let $Q$ be the set of query objects and $\mathcal{H} \subseteq \mathcal{D} - Q$ be
the non-query (database) objects covered by the convex hull of $Q$. Furthermore, let
$o \in \mathcal{D}$ be a database object and $q_{ref} \in Q$ a query object such that
$\forall q \in Q : d(o, q_{ref}) \geq d(o, q)$. Then for each object $p \in \mathcal{H}$ it
holds that $d(o, p) \leq d(o, q_{ref})$.

According to the above lemma the following statement holds: 

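Lemma 4. Let $Q$ be the set of query objects, $\mathcal{H} \subseteq \mathcal{D} - Q$ be the
database (non-query) objects covered by the convex hull of $Q$ and let $q_{ref} \in Q$ be a
query object such that $\forall q \in Q : d(o, q_{ref}) \geq d(o, q)$. Then

$$\forall o \in \mathcal{D} - \mathcal{H} - Q : o \in \text{I}k\text{-NNQ}(Q) \;\Leftrightarrow$$

at most $k' = k - |\mathcal{H}| - |Q|$ objects $p \in \mathcal{D} - \mathcal{H}$ are closer
to $o$ than $q_{ref}$, and

$$\forall o \in \mathcal{H} : o \in \text{I}k\text{-NNQ}(Q) \;\Leftrightarrow$$

at most $k' = k - |\mathcal{H}| - |Q| + 1$ objects $p \in \mathcal{D} - \mathcal{H}$ are
closer to $o$ than $q_{ref}$.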

Based on Lemma 4, given the number of objects in the convex hull of $Q$, we can
prune objects outside of the hull from I$k$-NN($Q$). Specifically, for an I$k$-NN query we
have the following pruning criterion: an object $o \in \mathcal{D}$ can be pruned as soon as
we find more than $k'$ objects $p \in \mathcal{D} - \mathcal{H}$ outside of the convex hull
of $Q$ that are closer to $o$ than $q_{ref}$. Note that the parameter $k'$ is set according
to Lemma 4 and depends on whether $o$ is in the convex hull of $Q$ or not. Depending on the
size of $Q$ and the number of objects within the convex hull of $Q$,
$k' = k - |\mathcal{H}| - |Q| + 1$ can become negative. In this case, we can terminate query
evaluation immediately, as no object can qualify for the inverse query (i.e., the inverse
query result is guaranteed to be empty). The case where $k' = k - |\mathcal{H}| - |Q| + 1$
becomes zero is another special case, as all objects outside of $\mathcal{H}$ can be pruned.
For all objects in the convex hull of $Q$ (including all query objects) we have to check
whether there are objects outside of $\mathcal{H}$ that prune them.

Fig. 3. I$k$-NN pruning based on Lemma 4.

As an example of how Lemma 4 can be used, consider the data shown in Figure
3 and assume that we wish to perform an inverse 10NN query using a set $Q$ of seven
query objects, shown as points in the figure; non-query database points are represented
by stars. In Figure 3(a), the goal is to determine whether candidate object $o_1$ is a
result, i.e., whether $o_1$ has all $q \in Q$ in its 10NN set. The query object having the
largest distance to $o_1$ is $q_{ref1}$. Since $o_1$ is located outside of the convex hull
of $Q$ (i.e., $o_1 \in \mathcal{D} - \mathcal{H} - Q$), the first equivalence of Lemma 4
states that $o_1$ is a result if at most
$k' = k - |\mathcal{H}| - |Q| = 10 - 4 - 7 = -1$ objects in
$\mathcal{D} - \mathcal{H} - Q$ are closer to $o_1$ than $q_{ref1}$. Thus, $o_1$ can be
safely pruned without even considering these objects (since obviously, at least zero objects
are closer to $o_1$ than $q_{ref1}$). Next, we consider object $o_2$ in Figure 3(b). The
query object with the largest distance to $o_2$ is $q_{ref2}$. Since $o_2$ is inside the
convex hull of $Q$, the second equivalence of Lemma 4 yields that $o_2$ is a result if at
most $k' = k - |\mathcal{H}| - |Q| + 1 = 10 - 4 - 7 + 1 = 0$ objects in
$\mathcal{D} - \mathcal{H} - Q$ are closer to $o_2$ than $q_{ref2}$. Thus, $o_2$ remains a
candidate until at least one object in $\mathcal{D} - \mathcal{H} - Q$ is found that is
closer to $o_2$ than $q_{ref2}$.
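The arithmetic of this example can be reproduced with a small helper. The sketch below only encodes the two thresholds of Lemma 4 and the resulting pruning test; the function names and parameter names are hypothetical, and determining the hull membership and the count of closer objects is left to the caller.

```python
def lemma4_threshold(k, hull_count, num_queries, inside_hull):
    """k' from Lemma 4: k - |H| - |Q| for objects outside the convex hull of Q,
    and one more for objects inside it."""
    k_prime = k - hull_count - num_queries
    return k_prime + 1 if inside_hull else k_prime

def can_prune(closer_outside, k, hull_count, num_queries, inside_hull):
    """An object is pruned once more than k' objects outside the hull are
    known to be closer to it than its furthest query point."""
    return closer_outside > lemma4_threshold(k, hull_count, num_queries, inside_hull)

# Values from the example: k = 10, |H| = 4, |Q| = 7.
k1 = lemma4_threshold(10, 4, 7, inside_hull=False)   # candidate o_1
k2 = lemma4_threshold(10, 4, 7, inside_hull=True)    # candidate o_2
```

With these values, `can_prune(0, 10, 4, 7, False)` already holds for $o_1$, while $o_2$ is pruned only after one closer object outside the hull has been found.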

Refinement. Each remaining candidate is checked for membership in the result of the inverse
query by performing a $k$-NN search and verifying whether its result includes $Q$.

5.2 Algorithm 



We now present a complete algorithm that traverses an aggregate R-tree (ARTree),
which indexes $\mathcal{D}$, and computes I$k$-NNQ($Q$) for a given set $Q$ of query
objects, using Lemma 4 to prune the search space. The entries in the tree nodes are
augmented with the cardinality of objects in the corresponding sub-tree. These counts can be
used to accelerate search, as we will see later.

In a nutshell, the algorithm, while traversing the tree, attempts to prune nodes based
on the lemma using the information known so far about the points of $\mathcal{D}$ that are
included in the convex hull (filtering). The objects that survive the pruning are inserted
into the candidate set. During the refinement step, for each point $c$ in the candidate set,
we run a $k$-NN query to verify whether $c$ contains $Q$ in its $k$-NN set.

Algorithm 1 in Appendix B gives pseudocode for our approach. The ARTree is traversed
in a best-first search manner [4], prioritizing the access of the nodes according
to the maximum possible distance (in case of a non-leaf entry we use MinDist) of their
contents to the query points $Q$. Specifically, for each R-tree entry $e$ we can compute,
based on its MBR, the furthest possible point $q_{ref}$ in $Q$ to a point indexed under $e$.
Processing the entries with the smallest such distances first helps to find points in the
convex hull of $Q$ earlier, which helps make the pruning bound tighter.

Thus, initially, we set $|\mathcal{H}| = 0$, assuming that in the worst case the number of
non-query points in the convex hull of $Q$ is 0. If the object which is deheaped is inside
the convex hull, we increase $|\mathcal{H}|$ by one. If a non-leaf entry is deheaped and its
MBR is contained in the hull, we increase $|\mathcal{H}|$ by the number of objects in the
corresponding sub-tree, as indicated by its augmented counter.

During the tree traversal, the accessed tree entries could be in one of the following
sets: (i) the set of candidates, which contains objects that could possibly be results of
the inverse query, (ii) the set of pruned entries, which contains (pruned) entries whose
subtrees cannot contain inverse query results, and (iii) the set of entries which
are currently in the priority queue. When an entry $e$ is deheaped, the algorithm checks
whether it can be pruned. For this purpose, it initializes a prune_counter, which is a
lower bound of the number of objects that are closer to every point $p$ in $e$ than $Q$'s
furthest point to $p$. For every entry $e'$ in all three sets (candidates, pruned, and
priority queue), we increase the prune_counter of $e$ by the number of points in $e'$ if the
following condition holds:
$\forall p \in e, \forall p' \in e' : dist(p, p') < dist(p, q_{ref})$. This condition can
efficiently be checked using the technique from [3]. An example where this condition is
fulfilled is shown in Figure 4. Here the prune_counter of $e$ can be increased by the
number of points in $e'$.

While updating prune_counter for $e$, we check whether
prune_counter $> k - |\mathcal{H}| - |Q|$ (prune_counter $> k - |\mathcal{H}| - |Q| + 1$)
for entries that are entirely outside of (intersect) the convex hull. As soon as this
condition is true, $e$ can be pruned, as it cannot contain objects that can participate in
the inverse query result (according to Lemma 4). Considering again Figure 4 and assuming the
number of points in $e'$ to be 5, $e$ could be pruned for $k \leq 10$ (since
prune_counter $= 5 > k - |\mathcal{H}| - |Q| = 10 - 2 - 4 = 4$ holds for $k = 10$). In this
case $e$ is moved to the set of pruned entries. If $e$ survives pruning, the node pointed to
by $e$ is visited and its entries are enheaped if $e$ is a non-leaf entry; otherwise $e$ is
inserted into the candidate set.

When the queue becomes empty, the filter step of the algorithm completes with a
set of candidates. For each object $c$ in this set, we check whether $c$ is a result of the
inverse query by performing a $k$-NN search and verifying whether its result includes
$Q$. In our implementation, to make this test faster, we replace the $k$-NN search by an
aggregate $\varepsilon$-range query around $c$, setting $\varepsilon = d(c, q_{ref})$, where
$q_{ref}$ is the furthest point of $Q$ to $c$. The objective is to determine whether the
number of objects in the range is greater than $k$. In this case, we can prune $c$;
otherwise $c$ is a result of the inverse query. The ARTree is used to process the aggregate
$\varepsilon$-range query; for every entry $e$ included in the $\varepsilon$-range, we just
increase the aggregate count by the augmented counter of $e$ without having to traverse the
corresponding subtree. In addition, we perform batch searching for candidates that are close
to each other, in order to optimize performance. The details are skipped due to space
constraints.

Fig. 4. Calculating the prune_counter of $e$.

6 Inverse Dynamic Skyline Query 

We again first discuss the case of a single query object, which corresponds to the reverse
dynamic skyline query [6], and then present a solution for the more interesting case
where $|Q| > 1$. Let $q$ be the (single) query object with respect to which we want to
compute the inverse dynamic skyline. Any object $o \in \mathcal{D}$ defines a pruning region,
such that any object $o'$ in this region cannot be part of the inverse query result.
Formally:

Definition 2 (Pruning Region). Let $q = (q^1, \ldots, q^d) \in Q$ be a single
$d$-dimensional query object and $o = (o^1, \ldots, o^d) \in \mathcal{D}$ be any
$d$-dimensional database object. Then the pruning region $PR_q(o)$ of $o$ w.r.t. $q$ is
defined as the $d$-dimensional rectangle where the $i$th dimension of $PR_q(o)$ is given by
$[\frac{q^i + o^i}{2}, +\infty]$ if $q^i \leq o^i$ and $[-\infty, \frac{q^i + o^i}{2}]$ if
$q^i \geq o^i$.

The pruning region of an object o with respect to a single query object q is illustrated 
by the shaded region in Figure 5(a). 
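The pruning region can be turned into a simple membership test. The sketch below assumes the bounds are the per-dimension midpoints between $q$ and $o$, on the side of $o$; the function names are hypothetical, and for $q^i = o^i$ the code arbitrarily takes the upper interval. The second helper states the property that makes the region useful: inside it, $o$ is at least as close as $q$ in every coordinate.

```python
def in_pruning_region(p, o, q):
    """True if p lies in PR_q(o): in every dimension, p is on o's side of the
    midpoint between q and o."""
    for pi, oi, qi in zip(p, o, q):
        mid = (qi + oi) / 2
        if qi <= oi:
            if pi < mid:        # interval [mid, +inf]
                return False
        elif pi > mid:          # interval [-inf, mid]
            return False
    return True

def at_least_as_close_everywhere(o, q, p):
    """True if o is at least as close to p as q is, coordinate by coordinate."""
    return all(abs(oi - pi) <= abs(qi - pi) for oi, qi, pi in zip(o, q, p))

q, o = (0, 0), (2, -2)
inside = in_pruning_region((3, -1.5), o, q)
outside = in_pruning_region((0.5, -3), o, q)
```

For the point `(3, -1.5)` both helpers return true, which matches the intuition that an object in the pruning region cannot have $q$ in its dynamic skyline because $o$ is preferable in every dimension.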

Filter step. As shown in [6], any object $p \in \mathcal{D}$ can be safely pruned if $p$ is
contained in the pruning region of some $o \in \mathcal{D}$ w.r.t. $q$ (i.e.
$p \in PR_q(o)$). Accordingly, we can use $q$ to divide the space into $2^d$ partitions by
splitting along each dimension at $q$. Let $o \in \mathcal{D}$ be an object in any partition
$P$; $o$ is an I-DSQ candidate iff there is no other object $p \in P \subseteq \mathcal{D}$
that dominates $o$ w.r.t. $q$.




(a) pruning region (b) candidates 

Fig. 5. Single-query case 



Thus, we can derive all I-DSQ candidates as follows: First, we split the data
space into the $2^d$ partitions at the query object $q$ as mentioned above. Then, in each
partition, we compute the skyline³, as illustrated in the example depicted in Figure
5(b). The union of the four skylines is the set of the inverse query candidates (e.g.,
$\{o_1, o_2, o_3, o_5, o_6, o_8\}$ in our example).

Refinement. The result of the reverse dynamic skyline query is finally obtained
by verifying for each candidate $c$ whether there is an object in $\mathcal{D}$ which
dominates $q$ w.r.t. $c$. This can be done by checking whether the hypercube centered at $c$
with extent $2 \cdot |c^i - q^i|$ at each dimension $i$ is empty. For example, candidate
$o_5$ in Figure 5(b) is not a result, because the corresponding box (denoted by dashed
lines) contains $o_7$. This means that in both dimensions $o_7$ is closer to $o_5$ than $q$
is.
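The single-query refinement test can be sketched with an explicit dominance check. The code below is a brute-force illustration with hypothetical names; it assumes the usual definition of dynamic dominance (at least as close to $c$ in every dimension and strictly closer in at least one) and scans all objects instead of issuing a window query for the box around $c$.

```python
def dominates_wrt(p, q, c):
    """p dynamically dominates q with respect to c: p is at least as close to
    c as q in every dimension and strictly closer in at least one."""
    dp = [abs(a - b) for a, b in zip(p, c)]
    dq = [abs(a - b) for a, b in zip(q, c)]
    return (all(x <= y for x, y in zip(dp, dq))
            and any(x < y for x, y in zip(dp, dq)))

def reverse_dynamic_skyline(data, q):
    """All c in data that have q in their dynamic skyline, i.e. no other
    object dominates q with respect to c."""
    return [c for c in data if c != q
            and not any(dominates_wrt(p, q, c)
                        for p in data if p != c and p != q)]

data = [(4, 4), (3, 3), (1, 5), (-2, 1)]
result = reverse_dynamic_skyline(data, q=(0, 0))
```

In this example `(4, 4)` and `(3, 3)` block each other: each lies inside the box that the other one spans with `q`, so neither has `q` in its dynamic skyline.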

6.1 IQ Framework Implementation 

Fast Query Based Validation Following our framework, first the set Q of query ob- 
jects is used to decide whether it is possible to have any result at all. For this, we use 
the following lemma: 

Lemma 5. Let $q \in Q$ be any query object and let $S$ be the set of $2^d$ partitions derived
from dividing the object space at $q$ along the axes into two halves in each dimension. If
in each partition $r \in S$ there is at least one query object $q' \in Q$ ($q' \neq q$), then
there cannot be any result.
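The condition of Lemma 5 is cheap to test because it involves only the query objects. A minimal sketch, assuming coordinate tuples and query objects in general position (a coordinate equal to that of $q$ is assigned to the upper half here, which is an arbitrary choice); the names `orthant` and `result_must_be_empty` are hypothetical.

```python
def orthant(p, q):
    """Identify the partition of p relative to q by one flag per dimension."""
    return tuple(pi >= qi for pi, qi in zip(p, q))

def result_must_be_empty(queries):
    """True if some query object q has another query object in every one of
    the 2^d partitions obtained by splitting the space at q."""
    dim = len(queries[0])
    for q in queries:
        occupied = {orthant(p, q) for p in queries if p != q}
        if len(occupied) == 2 ** dim:
            return True
    return False

surrounded = result_must_be_empty([(0, 0), (1, 1), (-1, 1), (1, -1), (-1, -1)])
not_surrounded = result_must_be_empty([(0, 0), (1, 1)])
```

The check costs on the order of $|Q|^2$ partition tests, independent of the database size.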

Query Based Pruning. We now propose a filter which uses only the set $Q$ of query objects
in order to reduce the space of candidate results. We explore similar strategies as
the fast query based validation. For any pair of query objects $q, q' \in Q$, we can define
two pruning regions, according to Definition 2: $PR_q(q')$ and $PR_{q'}(q)$. Any object
inside these regions cannot be a candidate of the inverse query result because it cannot
have both $q$ and $q'$ in its dynamic skyline point set. Thus, for every pair of query
objects, we can determine the corresponding pruning regions and use their union to prune
objects or R-tree nodes that are contained in it. Figure 6 shows examples of the pruning
space for $|Q| = 3$ and $|Q| = 4$. Observe that with the increase of $|Q|$ the remaining
space, which may contain candidates, becomes very limited.

3 Only objects within the same partition are considered for the dominance relation.

The main challenge is how to encode and use the pruning space defined by $Q$, as
it can be arbitrarily complex in the multidimensional space. As for the Ik-NNQ case,
our approach is not to explicitly compute and store the pruning space, but to check
on demand whether each object (or R-tree MBR) can be pruned by one or more query
pairs. This has a complexity of $O(|Q|^2)$ checks per object. In Appendix C, we show
how to reduce this complexity for the special 2D case. The techniques shown there can
also be used in higher dimensional spaces, with lower pruning effect.
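The on-demand check over all query pairs can be sketched as below. Definition 2 is not reproduced in this section, so the sketch assumes the usual dynamic dominance relation (per-dimension absolute distances to the reference object, no worse in every dimension and strictly better in at least one); the names `dominates` and `pruned_by_query_pair` are hypothetical.

```python
from itertools import permutations


def dominates(a, b, origin):
    """True if a dominates b with respect to origin (assumed definition)."""
    da = [abs(x - c) for x, c in zip(a, origin)]
    db = [abs(x - c) for x, c in zip(b, origin)]
    no_worse = all(x <= y for x, y in zip(da, db))
    strictly_better = any(x < y for x, y in zip(da, db))
    return no_worse and strictly_better


def pruned_by_query_pair(o, queries):
    """True if some query object dominates another one w.r.t. o.

    In that case o cannot have both query objects in its dynamic skyline.
    The loop visits every ordered pair, i.e. O(|Q|^2) checks per object.
    """
    return any(dominates(q1, q2, o) for q1, q2 in permutations(queries, 2))
```

For two query objects at (0, 0) and (2, 2), an object at (-1, -1) is pruned, while an object at (0, 2) survives because neither query object dominates the other from that position.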
Fig. 6. Pruning regions of query objects



Object Based Pruning. For any candidate object $o$ that is not pruned during the query-based
filter step, we need to check if there exists any other database object $o'$ which
dominates some $q \in Q$ with respect to $o$. If we can find such an $o'$, then $o$ cannot have
$q$ in its dynamic skyline and thus $o$ can be pruned from the candidate list.



Refinement. In the refinement step, each candidate $c$ is verified by performing a dynamic
skyline query using $c$ as query point. The result should contain all $q_i \in Q$;
otherwise $c$ is dropped. The refinement step can be improved by the following observation
(cf. Figure 7): for checking whether a candidate $o_1$ has all $q_i \in Q$ in its dynamic skyline,
it suffices to check whether there exists at least one other object $o_j \in \mathcal{D}$ which prevents
one $q_i$ from being part of the skyline. Such an object has to lie within the MBR defined
by $q_i$ and $q_i'$ (which is obtained by reflecting $q_i$ through $o_1$). If no point is within the $|Q|$
MBRs, then $o_1$ is reported as result.
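The MBR-based refinement can be illustrated with a brute-force sketch. The names `reflect`, `strictly_inside` and `is_result` are hypothetical, the database is scanned linearly instead of through the R-tree, and the box is treated as open (an object blocks $q_i$ only if it is strictly closer to the candidate in every dimension); the boundary handling depends on the exact dominance definition and is an assumption here.

```python
def reflect(q, c):
    """Reflect point q through the candidate c."""
    return tuple(2 * ci - qi for qi, ci in zip(q, c))


def strictly_inside(p, corner_a, corner_b):
    """True if p lies in the open box spanned by the two corners."""
    return all(min(a, b) < x < max(a, b)
               for x, a, b in zip(p, corner_a, corner_b))


def is_result(c, queries, database):
    """Verify candidate c against all |Q| boxes (linear scan, sketch only)."""
    for q in queries:
        q_reflected = reflect(q, c)
        for o in database:
            if o == c or o == q:
                continue  # the candidate and q itself never block q
            if strictly_inside(o, q, q_reflected):
                return False  # o prevents q from being in the skyline of c
    return True
```

In an index-based setting each of the $|Q|$ boxes would be issued as a window query that stops at the first hit, so a full dynamic skyline computation per candidate is avoided.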



Fig. 7. Refinement area defined by $q_1$, $q_2$ and $o_1$



6.2 Algorithm 



The algorithm for I-DSQ is shown as Algorithm 2 in Appendix B. During the filter
steps, the tree is traversed in a best-first manner, where entries are accessed by their
minimal distance (MinDist) to the farthest query object. For each entry $e$ we check if $e$
is completely contained in the union of pruning regions defined by all pairs of queries
$(q_i, q_j) \in Q$, i.e., $\bigcup_{(q_i, q_j) \in Q} PR_{q_i}(q_j)$. In addition, for each accessed database object
$o_i$ and each query object $q_j$, the pruning region is extended by $PR_{q_j}(o_i)$. Analogously
to the Ik-NN case, lists for the candidates and pruned entries are maintained. The
pruning conditions of Appendix C are used wherever applicable to reduce the computational
cost. Finally, the remaining candidates are refined using the refinement strategy
described in Section 6.1.
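The best-first access order described above can be sketched with a priority queue keyed on the MinDist to the farthest query object. This is an illustration only: it assumes Euclidean distance, represents an entry's MBR as a hypothetical `(lower, upper)` pair of corner tuples, and orders a flat list of entries rather than traversing a real R-tree.

```python
import heapq
import math


def min_dist(q, lower, upper):
    """Euclidean MinDist between point q and the box [lower, upper]."""
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                         for x, lo, hi in zip(q, lower, upper)))


def priority(queries, lower, upper):
    """MinDist of the box to its farthest query object."""
    return max(min_dist(q, lower, upper) for q in queries)


def best_first_order(entries, queries):
    """Return the (lower, upper) entries in the order they would be polled."""
    heap = [(priority(queries, lo, hi), i, (lo, hi))
            for i, (lo, hi) in enumerate(entries)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

The running index `i` in each heap tuple only breaks ties between equal priorities, so the boxes themselves are never compared.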



7 Experiments 



For each of the inverse query predicates discussed in the paper, we compare our proposed
solution based on multi-query-filtering (MQF) with a naive approach (Naive)
and another intuitive approach based on single-query-filtering (SQF). The naive algorithm
(Naive) computes the corresponding reverse query for every $q \in Q$ and intersects
their results iteratively. To be fair, we terminated Naive as soon as the intersection of
the results obtained so far was empty. SQF performs an R$k$NN (R$\varepsilon$-range / RDS) query using
one randomly chosen query point as a filter step to obtain candidates. For each
candidate, an $\varepsilon$-range ($k$NN / DS) query is issued and the candidate is confirmed if all
query points are contained in the result of the query (refinement step). Since the pages
accessed by the queries in the refinement step are often redundant, we use a buffer to
further boost the performance of SQF. We employed R*-trees [1] with a page size of 1 KB to
index the datasets used in the experiments. For each method, we present the number of
page accesses and the runtime. To give insight into the impact of the different parameters
on the cardinality of the obtained results, we also included this number in the charts.
In all settings we performed 1000 queries and averaged the results. All methods were
implemented in Java 1.6 and tests were run on a dual-core (3.0 GHz) workstation with 2
GB of main memory running Windows XP. The performance evaluation settings are
summarized below; the numbers in bold correspond to the default settings:



parameter            values
db size              100000 (synthetic), 175812 (real)
dimensionality       2, 3, 4, 5
$\varepsilon$        0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1
$k$                  50, 100, 150, 200, 250
# inverse queries    1, 3, 5, 10, 15, 20, 25, 30, 35
query extent         0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.0006
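The Naive baseline used in the comparison can be sketched for the $\varepsilon$-range predicate as follows. The function names are hypothetical, and the reverse $\varepsilon$-range query is replaced by a brute-force scan (for a symmetric distance, the objects that have $q$ in their $\varepsilon$-range are exactly those within distance $\varepsilon$ of $q$); the evaluated implementation uses an R*-tree instead.

```python
import math


def reverse_range_query(q, database, eps):
    """Brute-force stand-in for an index-based reverse eps-range query."""
    return {o for o in database if math.dist(o, q) <= eps}


def naive_inverse_range_query(queries, database, eps):
    """Intersect the reverse-query results of all query objects."""
    result = None
    for q in queries:
        partial = reverse_range_query(q, database, eps)
        result = partial if result is None else result & partial
        if not result:
            break  # early termination: the intersection is already empty
    return result or set()
```

The early exit mirrors the fairness measure described for Naive: once the running intersection is empty, no further reverse queries are issued.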



The experiments were performed using several datasets: 

- Synthetic Datasets: Clustered and uniformly distributed objects in $d$-dimensional
space.

- Real Dataset: Vertices in the Road Network of North America (see footnote 4). Contains 175,812
two-dimensional points.

The datasets were normalized such that their minimum bounding box is $[0, 1]^d$. For
each experiment, the query objects $Q$ for the inverse query were chosen randomly from
the database. Since the number of results highly depends on the distance between inverse
query points (in particular for the I$\varepsilon$-RQ and Ik-NNQ), we introduced an additional
parameter called extent to control the maximal distance between the query objects.
The value of extent corresponds to the volume (fraction of data space) of a cube
that minimally bounds all queries. For example, in the 3D space the default cube would
have a side length of 0.073. A small extent ensures that the queries are placed close to
each other, generally resulting in more results. In this section, we show the behavior of
all three algorithms on the uniform datasets only. Experiments on the other datasets can
be found in Appendix E.
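Since the extent is defined as the volume of the bounding cube, the side length $s$ in $d$ dimensions follows directly. The worked relation below assumes an extent of 0.0004 from the parameter list; that this value is the default is only inferred from the stated side length of 0.073 and is not stated explicitly in the text.

```latex
s = \mathit{extent}^{1/d},
\qquad d = 3:\quad 0.0004^{1/3} \approx 0.0737,
\qquad 0.073^{3} \approx 3.9 \times 10^{-4}
```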

7.1 Inverse $\varepsilon$-Range Queries

We first compared the algorithms on inverse $\varepsilon$-range queries. Figure 8(a) shows that
the relative speed of our approach (MQF) compared to Naive grows significantly with
increasing $\varepsilon$; for Naive, the cardinality of the result set returned by each query depends
on the space covered by the hypersphere, which is in $O(\varepsilon^d)$. In contrast, our strategy
applies spatial pruning early, leading to a low number of page accesses. SQF is faster
than Naive, but still needs around twice as many page accesses as MQF. MQF performs
even better with an increasing number of query points in $Q$ (as depicted in Figure 8(b)),
as in this case the intersection of the ranges becomes smaller. The I/O cost of SQF in
this case remains almost constant, which is mainly due to the use of the buffer, which
lowers the page accesses in the refinement step. Similar results can be observed when
varying the database size (Figure 8(e)) and query extent (Figure 8(d)). For the data dimensionality
experiment (Figure 8(c)) we set $\varepsilon$ such that the sphere defined by $\varepsilon$
always covers the same percentage of the data space, to make sure that we still obtain
results when increasing the dimensionality (note, however, that the number of results is



4 Obtained and modified from http://www.cs.fsu.edu/~lifeifei/SpatialDataset.htm. The original 
source is the Digital Chart of the World Server (http://www.maproom.psu.edu/dcw/). 



still unsteady). Increasing dimensionality has a negative effect on performance. However,
MQF copes better with data dimensionality than the other approaches. In a last
experiment (see Figure 8(f)) we compared the computational costs of the algorithms.
Even though inverse queries are I/O bound, MQF is still preferable for main-memory
problems.




7.2 Inverse $k$NN Queries

The three approaches for inverse $k$NN search show a similar behavior as those for the I$\varepsilon$-RQ.
Specifically, the behavior for varying $k$ (Figure 9(a)) is comparable to varying $\varepsilon$, and
increasing the query number (Figure 9(b)) and the query extent (Figure 9(d)) yields the
expected results. When testing on datasets with different dimensionality, the advantage
of MQF becomes even more significant when $d$ increases (cf. Figure 9(c)). In contrast to
the I$\varepsilon$-RQ results, for I$k$NN queries the page accesses of MQF decrease (see Figure 9(e))
when the database size increases (while the performance of SQF still degrades). This
can be explained by the fact that the number of pages accessed is strongly correlated
with the number of obtained results. Since for the I$\varepsilon$-RQ the parameter $\varepsilon$ remained
constant, the number of results increased with a larger database. For I$k$NN, in contrast, the number
of results decreases and so does the number of pages accessed by MQF. As
in the previous set of experiments, MQF also has the lowest runtime (Figure 9(f)).



7.3 Inverse Dynamic Skyline Queries 

Similar results as for the Ik-NNQ algorithm are obtained for the inverse dynamic skyline
queries (I-DSQ). Increasing the number of queries in $Q$ reduces the cost of the
MQF approach while the costs of the competitors increase. Since the average number
of results approaches zero faster than for the other two types of inverse queries, we
choose 4 as the default size of the query set. Note that the number of results for I-DSQ
intuitively increases exponentially with the dimensionality of the dataset (cf. Figure
10(b)); thus this value can be much larger for higher dimensional datasets. Increasing
the distance among the queries does not affect the performance, as seen in Figure 10(c);
regarding the number of results, in contrast to inverse range and $k$NN queries, inverse
dynamic skyline queries are almost insensitive to the distance among the query points.
The rationale is that dynamic skyline queries can have results which are arbitrarily far
away from the query point; thus the same holds for the inverse case. The same effect
can be seen for increasing database size (cf. Figure 10(d)). The advantage of MQF over
the other two approaches remains constant. Like inverse range and $k$NN queries,
I-DSQ are I/O bound (see Figure 10(e)), but MQF is still preferable for main-memory
problems.

8 Conclusions 

In this paper we introduced and formalized the problem of inverse query processing.
We proposed a general framework for such queries using a filter-refinement strategy
and applied this framework to the problem of answering inverse $\varepsilon$-range queries, inverse
$k$NN queries and inverse dynamic skyline queries. Our experiments show that
our framework significantly reduces the cost of inverse queries compared to straightforward
approaches. In the future, we plan to extend our framework to inverse queries
with different query predicates, such as top-$k$ queries. In addition, we will investigate
inverse query processing in the bi-chromatic case, where queries and objects are taken
from different datasets. Another interesting extension of inverse queries is to allow the
user not only to specify objects that have to be in the result, but also objects that must
not be in the result.




Fig. 10. I-DSQ algorithms on uniform dataset (panel (a): I/O cost w.r.t. $|Q|$)

A Proofs of Lemmas



A.1 Proof of Lemma 1

Proof. By contradiction: Let $q \in Q$ such that:

$o \notin \text{I}k'\text{-NNQ}(\{q\})$ in $\mathcal{D}' \cup \{q\}$.

That is, $o$ does not have $q$ as one of its $k'$-nearest neighbors in the database $\mathcal{D}' \cup \{q\}$
containing all (and only) non-query objects and $q$. This implies that there exist at least
$k'$ objects $o' \in \mathcal{D}' \cup \{q\}$ such that $dist(o, o') < dist(o, q)$. Let $q_{ref} \in Q$ be the query
farthest from $o$. Thus, for each of the $|Q| - 1$ objects $q' \in Q$, $q' \neq q_{ref}$, it holds that
$dist(o, q') < dist(o, q_{ref})$, and for each of the $k'$ objects $o'$ it holds that $dist(o, o') <
dist(o, q) \leq dist(o, q_{ref})$. Thus there exist at least $|Q| - 1 + k' = k$ objects that are
closer to $o$ than $q_{ref}$. This implies that

$q_{ref} \notin k\text{NN}(o)$ in $\mathcal{D}$

and thus

$o \notin \text{I}k\text{-NNQ}(Q)$ in $\mathcal{D}$.

Therefore, we have shown that

$\neg(o \in \text{I}k'\text{-NNQ}(\{q\})$ in $\mathcal{D}' \cup \{q\}) \Rightarrow \neg(o \in \text{I}k\text{-NNQ}(Q)$ in $\mathcal{D})$

which is equivalent to

$o \in \text{I}k\text{-NNQ}(Q)$ in $\mathcal{D} \Rightarrow \forall q \in Q : o \in \text{I}k'\text{-NNQ}(\{q\})$ in $\mathcal{D}' \cup \{q\}$

As a side note: The counter direction $\Leftarrow$ also holds, but is not required for pruning.
Proof. Assume that

$\forall q \in Q : o \in \text{I}k'\text{-NNQ}(\{q\})$ in $\mathcal{D}' \cup \{q\}$

where $k' = k - |Q| + 1$.

Then, for each $q \in Q$ there exist at most $k' - 1$ objects $o' \in \mathcal{D} \setminus Q$ such that
$dist(o, o') < dist(o, q)$. In addition, there exist at most $|Q| - 1$ query objects $q' \in
Q \setminus \{q\}$ such that $dist(o, q') < dist(o, q)$. Thus, there exist at most $k' - 1 + |Q| - 1 =
k - |Q| + 1 - 1 + |Q| - 1 = k - 1$ objects which are closer to $o$ than $q$; thus $q$ must
be a $k$NN of $o$. Since this holds for each $q \in Q$ we get

$o \in \text{I}k\text{-NNQ}(Q)$ in $\mathcal{D}$

A.2 Proof of Lemma 2 

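Proof. Due to Lemma 1 we have

$o \in \text{I}k\text{-NNQ}(Q)$ in $\mathcal{D} \Rightarrow \forall q \in Q : o \in \text{I}k'\text{-NNQ}(\{q\})$ in $\mathcal{D}' \cup \{q\}$.

Again, let $q_{ref}$ be the query point with the largest distance to $o$. Since $q_{ref} \in Q$ we get:

$\forall q \in Q : o \in \text{I}k'\text{-NNQ}(\{q\})$ in $\mathcal{D}' \cup \{q\}$
$\Rightarrow o \in \text{I}k'\text{-NNQ}(\{q_{ref}\})$ in $\mathcal{D}' \cup \{q_{ref}\}$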



Fig. 11. Illustration of Corollary 1 



A.3 Proof of Lemma 3 

We first require the following corollary: 

Corollary 1. Let $Q \subset \mathbb{R}^d$ be a set of points and $Q' \subseteq Q$ be the vertices of the convex
hull of $Q$ in $\mathbb{R}^d$. Then, for each point $o \in \mathbb{R}^d$, the farthest point in $Q$ to $o$ must be in $Q'$
as well.

Proof. Consider any point $o \in \mathbb{R}^d$ and its farthest point $q \in Q$. Then all points in $Q$
must be located in the hyper-sphere centered at $o$ with radius $d(o, q)$. Now, we can prove
the above corollary by contradiction, assuming that $q$ is not a convex-hull vertex. If $q$ is
assumed to be within the convex hull (not lying on the margin of the convex hull), then
the hyper-sphere splits the convex hull into points that are inside the sphere and points
that are outside of the sphere, as shown for $q_2$ in Figure 11. Consequently, the convex
hull contains points that are farther from $o$ than $q$, which contradicts the assumption.
Now, we assume that $q$ lies on the margin (but not on a vertex) of the convex hull, which
corresponds to a region of a hyper-plane, like $q_3$ in our example. If we move along this
hyper-plane starting from $q$, we are still within the convex hull but leave the hyper-sphere
of $o$. Consequently, again, the convex hull contains points that are farther from $o$
than $q$, which again contradicts the assumption.

Now we can use Corollary 1 to prove Lemma 3:
Proof. By definition of $q_{ref}$ it holds that

$q_{ref} = \arg\max_{q \in Q} (dist(o, q))$

Since the vertices of the convex hull of $Q$ consist only of points in $Q$, Corollary 1 leads
to

$q_{ref} = \arg\max_{c \in C} (dist(o, c))$

Thus,

$\forall c \in C : dist(o, q_{ref}) \geq dist(o, c)$

and since $H \subseteq C$:

$\forall p \in H : dist(o, q_{ref}) \geq dist(o, p)$



A.4 Proof of Lemma 4 

Proof. $\Rightarrow$: If $o \in \text{I}k\text{-NNQ}(Q)$, then all query points (including $q_{ref}$) are in the $k$NN
set of $o$. Since for all points $p$ in $H$, $d(o, p) \leq d(o, q_{ref})$ (see Lemma 3), all points in
$H$ should also be in the $k$NN set of $o$. Therefore, in the (worst) case where $q_{ref}$ is the
$k$-th NN of $o$, there can be $k - |H| - |Q|$ points outside the convex hull closer to $o$ than
$q_{ref}$ if $o \notin H$, or $k - |H| + 1 - |Q|$ points if $o \in H$.

$\Leftarrow$: If $o$ is outside the hull, from the points in $H \cup Q$, $q_{ref}$ is the furthest one to $o$
(see Lemma 3). If there are at most $k - |H| - |Q|$ points outside the hull closer to $o$
than $q_{ref}$ is, then the distance ranking of $q_{ref}$ is at most $k$. Since all other points in $Q$
are closer to $o$ than $q_{ref}$ is, it should be $o \in \text{I}k\text{-NNQ}(Q)$. If $o \in H$, the bound should
be $k - |H| - |Q| + 1$, as $o$ should be excluded from $H$ in the proof.

A.5 Proof of Lemma 5 

Proof. Let us consider the space partitioning $S$ derived from dividing the object space
at $q$. Each $q'$ located within partition $r \in S$ generates a pruning region $PR_q(q')$ (cf.
Definition 2) that totally covers the partition $r' \in S$ which is opposite to $r$ w.r.t. $q$.
Since we assume that we have at least one query object $q' \neq q$ in each partition $r \in S$,
all partitions $r' \in S$ are totally covered by a pruning region and, consequently,
the complete data space can be pruned. An example in the two-dimensional space is
illustrated in Figure 12.

Fig. 12. Fast query based validation filter in 2D-space



B Algorithms 

In this section we present the pseudocode of the I$k$NNQ (cf. Algorithm 1) and the
I-DSQ (cf. Algorithm 2) algorithms. A more detailed explanation is given in Sections 5.2
and 6.2, respectively.



Algorithm 1 Inverse kNN Query

Require: Q, k, ARTree
    // Fast Query Based Validation
    if |Q| > k then
        return "no result" and terminate algorithm
    end if
    pq = PriorityQueue ordered by max_{q_i ∈ Q} MinDist
    pq.add(ARTree.root entries)
    |H| = 0
    LIST candidates, prunedEntries
    // Query/Object Based Pruning
    while ¬pq.isEmpty() do
        e = pq.poll()
        if getPruneCount(e, Q, candidates, prunedEntries, pq) > k − |H| − |Q| then
            prunedEntries.add(e)
        else if e.isLeafEntry() then
            candidates.add(e)
        else
            pq.add(e.getChildren())
        end if
        if e ∈ convexHull(Q) then
            |H| += e.agg_count
        end if
    end while
    // Refinement Step
    LIST result
    for c ∈ candidates do
        if q_ref ∈ knnQuery(c, k) then
            result.add(c)
        end if
    end for
    return (result)



Algorithm 2 Inverse Dynamic Skyline Query

Require: Q, ARTree
    pq = PriorityQueue ordered by min_{q_i ∈ Q} MaxDist
    pq.add(ARTree.root entries)
    LIST candidates, prunedEntries
    // Filter step
    while ¬pq.isEmpty() do
        e = pq.poll()
        if canBePruned(e, Q, candidates, prunedEntries, pq) then
            prunedEntries.add(e)
        else if e.isLeafEntry() then
            candidates.add(e)
        else
            pq.add(e.getChildren())
        end if
    end while
    // Refinement Step
    LIST result
    for c ∈ candidates do
        if Q ⊆ dynamicSkyline(c) then
            result.add(c)
        end if
    end for
    return (result)



C Pruning Candidates in Inverse Dynamic Skyline Queries 



Here, we show how pruning a data object from the candidates set of an inverse skyline 
query can be accelerated. The pruning conditions discussed here hold for the special 
2D case. Nonetheless some of them can be extended to spaces of higher dimensionality, 
with lower pruning effectiveness. 
Fig. 13. Using query points at the border of $Q^{\square}$ to prune space



C.1 Query-Based Pruning

Let $Q^{\square}$ be the rectangle that minimally bounds all query objects. The main idea is to
identify pruning regions by just considering $Q^{\square}$ and one query object $q$ located on the
boundary of $Q^{\square}$. We first concentrate on pruning objects outside $Q^{\square}$; the cases for
pruning objects inside of $Q^{\square}$ will be discussed later.

Pruning Condition I: Assume that $q$ is located at one corner of $Q^{\square}$. Then the axis-aligned
region outside of $Q^{\square}$ where the $i$th dimension is defined by the interval $[c^i, +\infty]$
if $q^i > c^i$ and $[-\infty, c^i]$ otherwise (where $c$ is the center of $Q^{\square}$) can be pruned. The
rationale is that since $Q^{\square}$ is an MBR, at least one other query object must be at each
edge of $Q^{\square}$ located on the opposite side of $q$, which can be used to create a pruning
region at the side of $q$. In the example shown in Figure 13(a), $q$ is located at the lower
right corner of $Q^{\square}$ and, therefore, additional query objects must be located on both the
left and upper edge of $Q^{\square}$. As a consequence, any object in the lower-right shaded
region can be pruned.

Pruning Condition II: In the case where $q$ is located at a boundary of $Q^{\square}$, but not
at a corner of $Q^{\square}$, the half-space that is constructed by splitting the data space along the edge
containing $q$ and that does not contain $Q^{\square}$ defines the pruning region. Here, the rationale
is that since $q$ is on an edge $e$ (but not at a corner) of $Q^{\square}$, there must be at least two
additional query objects, located at the edges adjacent to $e$, respectively. By pairing
$q$ with each of these two objects, and merging the corresponding pruning regions, we
obtain the pruning region. For example, in Figure 13(b), for $q$, we get the shaded pruning
region below $q$. As another example, consider the union of $PR_{q_1}(q_4)$ and $PR_{q_3}(q_4)$ in
Figure 6(b), which prunes the whole half-plane below $q_4$. Thus, if all four edges of
$Q^{\square}$ contain four different query objects, then only objects in $Q^{\square}$ are candidate I-DSQ
results.

C.2 Object-Based Pruning 

For any candidate object $o$ that is not pruned during the query-based filter step, we need
to check if there exists any other database object $o'$ which dominates some $q \in Q$ with
respect to $o$. If we can find such an $o'$, then $o$ cannot have $q$ in its dynamic skyline and
thus $o$ can be pruned from the candidate list. Naively, we can determine, for each database
object $o'$ that we have found so far and each query object $q$, the pruning region $PR_q(o')$
according to Definition 2 and check if $o$ is located in this region. In the following,
we show how to perform this pruning without considering all possible combinations of
database and query objects.

Consider two query points $q_1$ and $q_2$ and the rectangle $Q^{\square}_{q_1 q_2}$ that minimally bounds
$q_1$ and $q_2$. Note that $Q^{\square}_{q_1 q_2}$ is fully contained in $Q^{\square}$ and the union of all $Q^{\square}_{q_i q_j}$ for each
pair $q_i, q_j \in Q$, $i \neq j$, is equal to $Q^{\square}$. Consider the axis-aligned space partitioning
according to the center point $c$ of $Q^{\square}_{q_1 q_2}$, resulting in four partitions, denoted as NE,
SE, SW and NW, as illustrated in Figure 14. According to the query-based pruning, the
two partitions containing $q_1$ and $q_2$, respectively, can be pruned. Now, we can show the
following:

Corollary 2. In each of the two remaining regions (NE and SW in the example) within
$Q^{\square}_{q_1 q_2}$, there can be at most one candidate.




Fig. 14. Object based pruning inside $Q^{\square}$



To prove this, let us consider the following two cases illustrated in Figures 14(a) and 
14(b): 

Case 1: Let us assume that there are two objects $o_1$ and $o_2$ in one of the regions that
have the same topology as the two query objects $q_1$ and $q_2$, as shown in Figure 14(a).
In this case both objects prune each other. The reason is that $o_1$ is in the pruning region
of $o_2$ w.r.t. $q_2$ and $o_2$ is in the pruning region of $o_1$ w.r.t. $q_1$. Consequently, there cannot
exist two candidates in this partition where the objects have a similar spatial relationship
as $q_1$ and $q_2$.

Case 2: Now, we assume that there exist two objects $o_1$ and $o_2$ in one partition
within $Q^{\square}_{q_1 q_2}$ such that $o_1 o_2$ is oriented perpendicularly to $q_1 q_2$, as illustrated in the
example in Figure 6(b). In this case, $o_1$ is pruned by $o_2$ for both query objects. The
reason is that in each dimension, the distance between $o_1$ and $o_2$ must be less than the
distance between $o_1$ and $q_1$ ($q_2$). However, $o_2$ can in general not be pruned by $o_1$; thus
$o_2$ remains a candidate.

We can now use Corollary 2 to obtain the following pruning condition: 

Pruning Condition III: Let $R \subseteq Q^{\square}$ be a region inside the query rectangle that
cannot be pruned using query-based pruning. Let $q_1, q_2 \in Q$ be two query points for
which it holds that $R$ is fully contained in the rectangle minimally bounding $q_1$ and
$q_2$. Since $R$ cannot be pruned based on query-based pruning only, $R$ must be located in
non-pruning regions (e.g. NE and SW in Figure 6(a)) of $q_1$ and $q_2$. Without loss of generality,
let us assume that $R$ is located in the NE region of $Q^{\square}_{q_1 q_2}$. Now let $O$ be the set of
database objects inside $R$. Let $a \in O$ be the object with the largest $x$ coordinate and let
$b \in O$ be the object with the largest $y$ coordinate. If $a \neq b$ we can prune $R$. If $a = b$,
then $a$ is a candidate and all other objects $c \in R$, $c \neq a$, can be pruned.
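Pruning Condition III reduces to a single pass over the objects in $R$, as the following 2D sketch illustrates for the NE case. The function name `candidate_in_ne_region` is hypothetical. With ties in a coordinate the objects $a$ and $b$ are not uniquely defined; the sketch then keeps any object that attains both maxima, which is an assumption made here so that no potential candidate is dropped before refinement.

```python
def candidate_in_ne_region(objects):
    """Apply Pruning Condition III to the 2D objects inside an NE region R.

    Returns the single surviving candidate, or None if all of R is pruned.
    """
    if not objects:
        return None
    corner = (max(x for x, _ in objects), max(y for _, y in objects))
    # a == b exactly when one object attains both coordinate maxima.
    return corner if corner in objects else None
```

For a region in the SW partition the same idea applies with the minima instead of the maxima.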

The above pruning condition allows us to prune objects inside $Q^{\square}$. The next pruning
condition allows us to prune objects outside $Q^{\square}$ using database objects. In the following,
let $Q^{\square}_i.max$ and $Q^{\square}_i.min$ denote the maximal and minimal coordinate of $Q^{\square}$ at
dimension $i$, respectively.

Pruning Condition IV: For the next pruning condition we use the sets of database
objects $O_i \subseteq \mathcal{D}$ outside $Q^{\square}$ for which it holds that each $o \in O_i$ intersects $Q^{\square}$ in one
dimension but not in the other. Such objects are shown in Figure 15. Let $O_i^+$ ($O_i^-$)
denote the subset of $O_i$ so that each object $o \in O_i^+$ ($O_i^-$) has a larger (smaller) coordinate
than $Q^{\square}$ in the other dimension. Now let $o_i^+ \in O_i^+$ ($o_i^- \in O_i^-$) be the object in $O_i^+$
($O_i^-$) with the smallest (largest) coordinate in the other dimension $j$. Any database
object which has a $j$ coordinate greater than $\frac{o_i^{+,j} + Q^{\square}_j.max}{2}$ or less than $\frac{o_i^{-,j} + Q^{\square}_j.min}{2}$ can
be pruned. The rationale of this pruning condition is that, due to the MBR property of
$Q^{\square}$, we know that at least one query object must be located on each edge of $Q^{\square}$. The
pruning regions defined are based on these query objects. For instance, for object $o_i^+$ in
Figure 15 we can exploit that there must be query objects $q_1$ and $q_2$ located on the left
and the right border edge of $Q^{\square}$, respectively. This allows us to create the two pruning
regions $PR_{q_1}(o_i^+)$ and $PR_{q_2}(o_i^+)$ according to Definition 2. These pruning regions are
smallest if $q_1$ and $q_2$ are located at the upper corners of $Q^{\square}$. Thus, we can prune any
object above the line bisecting the upper side of $Q^{\square}$ and $o_i^+$.

Fig. 15. Pruning regions outside of $Q^{\square}$.

D Pruning Techniques For The Bi-Chromatic Case

Here, we explain how the proposed pruning techniques, which are designed for the
mono-chromatic case, can easily be adapted to the bi-chromatic case. We assume
two data sets $\mathcal{D}$ and $\mathcal{D}'$. A bi-chromatic inverse query returns the set of objects $r \in \mathcal{D}'$
for which each inverse query object $q \in Q \subseteq \mathcal{D}$ is contained in the result of a $\mathcal{P}$
query applied on data set $\mathcal{D}$ using $r$ as query object. For each case of the predicate $\mathcal{P}$,
we briefly explain the changes to our technique.

Bi-Chromatic I$\varepsilon$-Range Queries. Here, the filter rectangle can directly be applied to
$\mathcal{D}'$, so there is no practical change.

Bi-Chromatic I$k$NN Queries. In this case we have to avoid allowing objects in $\mathcal{D}'$
to prune each other. In contrast to the mono-chromatic case, we only have to consider
objects in $\mathcal{D}$ to build $H$, i.e., the number of objects in the convex hull regions of $Q$.

Bi-Chromatic Inverse Dynamic Skyline Queries. In this case, the pruning region is
only defined by objects in $\mathcal{D}$. This pruning region is used to prune objects in $\mathcal{D}'$.

E Additional Experiments 

In this section we show the behavior of inverse queries on datasets other than the ones
used in Section 7. We excluded the Naive approach from the evaluation due to its poor
performance; this way, the difference between MQF and SQF becomes clearer. Due
to space limitations we focused on I$k$NN queries on the clustered dataset (cf. Figure 16)
and the real dataset (cf. Figure 17). Let us note that similar trends could be observed for
the other inverse query types.




Fig. 16. Ik-NNQ algorithms on clustered dataset. Panels: (a) I/O cost w.r.t. $k$; (b) I/O cost w.r.t. $|Q|$.

Fig. 17. Ik-NNQ algorithms on real dataset. Panels: (a) I/O cost w.r.t. $k$; (b) I/O cost w.r.t. $|Q|$; (e) CPU cost w.r.t. $|Q|$.



